[RegAllocBase] Produce IMPLICIT_DEF instead of COPY undef during cleanupFailedVReg #147392

Open

jrbyrnes wants to merge 2 commits into main

Conversation

jrbyrnes
Contributor

@jrbyrnes jrbyrnes commented Jul 7, 2025

If the LiveRange hacking in cleanupFailedVReg ends up producing an undef copy, optimize it into an IMPLICIT_DEF.

This helps simplify some analyses regarding spilling.
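
For illustration, a minimal MIR-level sketch of the rewrite (the register and class are taken from the attached test; the exact operands depend on the failed allocation):

    ; before: cleanupFailedVReg patches the failed range with an undef copy
    %30:av_128 = COPY undef $vgpr0_vgpr1_vgpr2_vgpr3
    ; after this change: the copy is rewritten in place
    %30:av_128 = IMPLICIT_DEF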

Change-Id: I134bc03e7eeb7f5324c1ce7005b96b28d1b01643
@llvmbot
Member

llvmbot commented Jul 7, 2025

@llvm/pr-subscribers-llvm-regalloc

Author: Jeffrey Byrnes (jrbyrnes)

Changes

Since this would result in spilling an undef register, the spill code is unnecessary.


Full diff: https://github.com/llvm/llvm-project/pull/147392.diff

2 Files Affected:

  • (modified) llvm/lib/CodeGen/InlineSpiller.cpp (+3)
  • (added) llvm/test/CodeGen/AMDGPU/regalloc-undef-copy-fold.mir (+80)
diff --git a/llvm/lib/CodeGen/InlineSpiller.cpp b/llvm/lib/CodeGen/InlineSpiller.cpp
index ee08dc42dce57..6b8b8731e6b56 100644
--- a/llvm/lib/CodeGen/InlineSpiller.cpp
+++ b/llvm/lib/CodeGen/InlineSpiller.cpp
@@ -946,6 +946,9 @@ foldMemoryOperand(ArrayRef<std::pair<MachineInstr *, unsigned>> Ops,
     if (MO.isUse() && !MO.readsReg() && !MO.isTied())
       continue;
 
+    if (MI->isCopy() && MI->getOperand(1).isUndef())
+      continue;
+
     if (MO.isImplicit()) {
       ImpReg = MO.getReg();
       continue;
diff --git a/llvm/test/CodeGen/AMDGPU/regalloc-undef-copy-fold.mir b/llvm/test/CodeGen/AMDGPU/regalloc-undef-copy-fold.mir
new file mode 100644
index 0000000000000..440753fc88277
--- /dev/null
+++ b/llvm/test/CodeGen/AMDGPU/regalloc-undef-copy-fold.mir
@@ -0,0 +1,80 @@
+# NOTE: Assertions have been autogenerated by utils/update_mir_test_checks.py UTC_ARGS: --version 5
+# RUN: llc -mtriple=amdgcn -mcpu=gfx908 -verify-machineinstrs --start-before=greedy,2 --stop-after=greedy,2 %s -o - | FileCheck %s
+
+# Make sure there's no machine verifier error
+
+# If RA is unable to find a register to allocate, then cleanupFailedVReg will do ad-hoc rewriting and will insert undefs to make the LiveRanges workable.
+# %30:av_128 = COPY undef $vgpr0_vgpr1_vgpr2_vgpr3 is an example of such a rewrite / undef. If we want to spill %30, we should not insert
+# actual spill code, as the source operand is undef.
+# Check that there are no verifier issues with the LiveRange of $vgpr0_vgpr1_vgpr2_vgpr3 / that we do not insert spill code for %30.
+
+
+--- |
+  define void @foo() #0 {
+    ret void
+  }
+
+  attributes #0 = { "amdgpu-waves-per-eu"="8,8" }
+
+...
+
+---
+name:            foo
+tracksRegLiveness: true
+stack:
+  - { id: 0, type: spill-slot, size: 32, alignment: 4 }
+machineFunctionInfo:
+  maxKernArgAlign: 4
+  isEntryFunction: true
+  waveLimiter:     true
+  scratchRSrcReg:  '$sgpr96_sgpr97_sgpr98_sgpr99'
+  stackPtrOffsetReg: '$sgpr32'
+  frameOffsetReg: '$sgpr33'
+  hasSpilledVGPRs: true
+  argumentInfo:
+    privateSegmentBuffer: { reg: '$sgpr0_sgpr1_sgpr2_sgpr3' }
+    dispatchPtr:     { reg: '$sgpr4_sgpr5' }
+    kernargSegmentPtr: { reg: '$sgpr6_sgpr7' }
+    workGroupIDX:    { reg: '$sgpr8' }
+    privateSegmentWaveByteOffset: { reg: '$sgpr9' }
+body:             |
+  bb.0:
+    ; CHECK-LABEL: name: foo
+    ; CHECK: INLINEASM &"; def $0 $1 $2 $3 $4", 1 /* sideeffect attdialect */, 10 /* regdef */, def %10, 10 /* regdef */, def %1, 10 /* regdef */, def %2, 10 /* regdef */, def $vgpr0_vgpr1_vgpr2_vgpr3, 10 /* regdef */, implicit-def $agpr0_agpr1_agpr2_agpr3_agpr4_agpr5_agpr6_agpr7_agpr8_agpr9_agpr10_agpr11_agpr12_agpr13_agpr14_agpr15
+    ; CHECK-NEXT: [[COPY:%[0-9]+]]:av_128 = COPY undef $vgpr0_vgpr1_vgpr2_vgpr3
+    ; CHECK-NEXT: SI_SPILL_AV128_SAVE [[COPY]], %stack.2, $sgpr32, 0, implicit $exec :: (store (s128) into %stack.2, align 4, addrspace 5)
+    ; CHECK-NEXT: SI_SPILL_AV160_SAVE %2, %stack.1, $sgpr32, 0, implicit $exec :: (store (s160) into %stack.1, align 4, addrspace 5)
+    ; CHECK-NEXT: SI_SPILL_AV256_SAVE %1, %stack.3, $sgpr32, 0, implicit $exec :: (store (s256) into %stack.3, align 4, addrspace 5)
+    ; CHECK-NEXT: [[COPY1:%[0-9]+]]:vreg_512 = COPY %10
+    ; CHECK-NEXT: SI_SPILL_V512_SAVE [[COPY1]], %stack.0, $sgpr32, 0, implicit $exec :: (store (s512) into %stack.0, align 4, addrspace 5)
+    ; CHECK-NEXT: [[COPY2:%[0-9]+]]:vreg_512 = COPY $agpr0_agpr1_agpr2_agpr3_agpr4_agpr5_agpr6_agpr7_agpr8_agpr9_agpr10_agpr11_agpr12_agpr13_agpr14_agpr15
+    ; CHECK-NEXT: SI_SPILL_V512_SAVE [[COPY2]], %stack.6, $sgpr32, 0, implicit $exec :: (store (s512) into %stack.6, align 4, addrspace 5)
+    ; CHECK-NEXT: INLINEASM &"; clobber", 1 /* sideeffect attdialect */, 10 /* regdef */, implicit-def early-clobber $vgpr0_vgpr1_vgpr2_vgpr3_vgpr4_vgpr5_vgpr6_vgpr7_vgpr8_vgpr9_vgpr10_vgpr11_vgpr12_vgpr13_vgpr14_vgpr15_vgpr16_vgpr17_vgpr18_vgpr19_vgpr20_vgpr21_vgpr22_vgpr23_vgpr24_vgpr25_vgpr26_vgpr27_vgpr28_vgpr29_vgpr30_vgpr31
+    ; CHECK-NEXT: [[SI_SPILL_V512_RESTORE:%[0-9]+]]:vreg_512 = SI_SPILL_V512_RESTORE %stack.6, $sgpr32, 0, implicit $exec :: (load (s512) from %stack.6, align 4, addrspace 5)
+    ; CHECK-NEXT: $agpr1_agpr2_agpr3_agpr4_agpr5_agpr6_agpr7_agpr8_agpr9_agpr10_agpr11_agpr12_agpr13_agpr14_agpr15_agpr16 = COPY [[SI_SPILL_V512_RESTORE]]
+    ; CHECK-NEXT: [[SI_SPILL_V512_RESTORE1:%[0-9]+]]:vreg_512 = SI_SPILL_V512_RESTORE %stack.0, $sgpr32, 0, implicit $exec :: (load (s512) from %stack.0, align 4, addrspace 5)
+    ; CHECK-NEXT: SI_SPILL_V512_SAVE [[SI_SPILL_V512_RESTORE1]], %stack.4, $sgpr32, 0, implicit $exec :: (store (s512) into %stack.4, align 4, addrspace 5)
+    ; CHECK-NEXT: [[SI_SPILL_AV256_RESTORE:%[0-9]+]]:vreg_256 = SI_SPILL_AV256_RESTORE %stack.3, $sgpr32, 0, implicit $exec :: (load (s256) from %stack.3, align 4, addrspace 5)
+    ; CHECK-NEXT: SI_SPILL_V256_SAVE [[SI_SPILL_AV256_RESTORE]], %stack.5, $sgpr32, 0, implicit $exec :: (store (s256) into %stack.5, align 4, addrspace 5)
+    ; CHECK-NEXT: [[SI_SPILL_AV160_RESTORE:%[0-9]+]]:vreg_160 = SI_SPILL_AV160_RESTORE %stack.1, $sgpr32, 0, implicit $exec :: (load (s160) from %stack.1, align 4, addrspace 5)
+    ; CHECK-NEXT: $vgpr0_vgpr1_vgpr2_vgpr3 = SI_SPILL_AV128_RESTORE %stack.2, $sgpr32, 0, implicit $exec :: (load (s128) from %stack.2, align 4, addrspace 5)
+    ; CHECK-NEXT: [[SI_SPILL_AV512_RESTORE:%[0-9]+]]:av_512 = SI_SPILL_AV512_RESTORE %stack.4, $sgpr32, 0, implicit $exec :: (load (s512) from %stack.4, align 4, addrspace 5)
+    ; CHECK-NEXT: [[SI_SPILL_V256_RESTORE:%[0-9]+]]:vreg_256 = SI_SPILL_V256_RESTORE %stack.5, $sgpr32, 0, implicit $exec :: (load (s256) from %stack.5, align 4, addrspace 5)
+    ; CHECK-NEXT: INLINEASM &"; use $0 $1 $2 $3 $4", 1 /* sideeffect attdialect */, 9 /* reguse */, [[SI_SPILL_AV512_RESTORE]], 9 /* reguse */, [[SI_SPILL_V256_RESTORE]], 9 /* reguse */, [[SI_SPILL_AV160_RESTORE]], 9 /* reguse */, undef $vgpr0_vgpr1_vgpr2_vgpr3, 9 /* reguse */, $agpr1_agpr2_agpr3_agpr4_agpr5_agpr6_agpr7_agpr8_agpr9_agpr10_agpr11_agpr12_agpr13_agpr14_agpr15_agpr16
+    ; CHECK-NEXT: SI_RETURN
+    INLINEASM &"; def $0 $1 $2 $3 $4", 1 /* sideeffect attdialect */, 10, def %22:vreg_512, 10, def %25:vreg_256, 10, def %28:vreg_160, 10, def $vgpr0_vgpr1_vgpr2_vgpr3, 10, implicit-def $agpr0_agpr1_agpr2_agpr3_agpr4_agpr5_agpr6_agpr7_agpr8_agpr9_agpr10_agpr11_agpr12_agpr13_agpr14_agpr15
+    %30:av_128 = COPY undef $vgpr0_vgpr1_vgpr2_vgpr3
+    %27:av_160 = COPY %28:vreg_160
+    %24:av_256 = COPY %25:vreg_256
+    SI_SPILL_V512_SAVE %22:vreg_512, %stack.0, $sgpr32, 0, implicit $exec :: (store (s512) into %stack.0, align 4, addrspace 5)
+    %18:vreg_512 = COPY $agpr0_agpr1_agpr2_agpr3_agpr4_agpr5_agpr6_agpr7_agpr8_agpr9_agpr10_agpr11_agpr12_agpr13_agpr14_agpr15
+    INLINEASM &"; clobber", 1 /* sideeffect attdialect */, 10, implicit-def early-clobber $vgpr0_vgpr1_vgpr2_vgpr3_vgpr4_vgpr5_vgpr6_vgpr7_vgpr8_vgpr9_vgpr10_vgpr11_vgpr12_vgpr13_vgpr14_vgpr15_vgpr16_vgpr17_vgpr18_vgpr19_vgpr20_vgpr21_vgpr22_vgpr23_vgpr24_vgpr25_vgpr26_vgpr27_vgpr28_vgpr29_vgpr30_vgpr31
+    $agpr1_agpr2_agpr3_agpr4_agpr5_agpr6_agpr7_agpr8_agpr9_agpr10_agpr11_agpr12_agpr13_agpr14_agpr15_agpr16 = COPY %18:vreg_512
+    %23:vreg_512 = SI_SPILL_V512_RESTORE %stack.0, $sgpr32, 0, implicit $exec :: (load (s512) from %stack.0, align 4, addrspace 5)
+    %26:vreg_256 = COPY %24:av_256
+    %29:vreg_160 = COPY %27:av_160
+    $vgpr0_vgpr1_vgpr2_vgpr3 = COPY %30:av_128
+    INLINEASM &"; use $0 $1 $2 $3 $4", 1 /* sideeffect attdialect */, 9, %23:vreg_512, 9, %26:vreg_256, 9, %29:vreg_160, 9, undef $vgpr0_vgpr1_vgpr2_vgpr3, 9, $agpr1_agpr2_agpr3_agpr4_agpr5_agpr6_agpr7_agpr8_agpr9_agpr10_agpr11_agpr12_agpr13_agpr14_agpr15_agpr16
+    SI_RETURN
+
+...

@llvmbot
Member

llvmbot commented Jul 7, 2025

@llvm/pr-subscribers-backend-amdgpu

@@ -946,6 +946,9 @@ foldMemoryOperand(ArrayRef<std::pair<MachineInstr *, unsigned>> Ops,
    if (MO.isUse() && !MO.readsReg() && !MO.isTied())
      continue;

    if (MI->isCopy() && MI->getOperand(1).isUndef())
Contributor

This shouldn't have gotten this far in the first place? We have a number of places already deleting copies of undef

Contributor

Oh, I now see in the other PR, this is from hacking up the failed ranges. Can you add the additional context to the commit message? There also might still be a better way to do this at the failure point

Contributor

COPY shouldn't be special; isn't the undef read case already handled above by the !readsReg check?

Contributor Author

Oh, I now see in the other PR, this is from hacking up the failed ranges.

Yeah -- the test case is probably misleading, since it's using the MIR mid-RA (i.e. after cleanupFailedVReg has created its version of the liveness).

Contributor Author

There also might still be a better way to do this at the failure point

I guess instead of creating undef copies we can just create implicit_defs. Seems like sort of an ad-hoc requirement, but cleanupFailedVReg seems like a hack anyways.

undef read case is already handled above here by the !readsReg check?

Do you mean the dest of an undef copy should be !readsReg? This is checking conditions of the register we plan to spill / fold, which in this case is the dest of the copy.
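
To make that concrete, a rough MIR-level sketch of the fold the new check avoids (the stack slot index is illustrative, not taken from the test): the operand being folded is the def of %30, so the !readsReg guard on MO never fires, and the undef flag on the copy's source is lost when the copy is turned into a store.

    ; spilling %30: its only def is a copy from an undef physreg
    %30:av_128 = COPY undef $vgpr0_vgpr1_vgpr2_vgpr3
    ; folding that def into the stack slot rewrites the COPY into a store of
    ; the source register, dropping the undef flag, so the spill reads
    ; $vgpr0_vgpr1_vgpr2_vgpr3 outside its live range:
    SI_SPILL_AV128_SAVE $vgpr0_vgpr1_vgpr2_vgpr3, %stack.2, $sgpr32, 0, implicit $exec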

Contributor Author

In the latest revision, I've changed it such that cleanupFailedVReg produces IMPLICIT_DEF instead of undef copies.

Contributor

I mean you are checking if the source operand isUndef, which for a use is equivalent to the MO.readsReg check above

Change-Id: Icfd4d568221b9d2425ebd7568c06b48c6a8ecdc9
    assert(UndefCopy->isCopy() && UndefCopy->getNumOperands() == 2);
    const MCInstrDesc &Desc = TII->get(TargetOpcode::IMPLICIT_DEF);
    UndefCopy->removeOperand(1);
    UndefCopy->setDesc(Desc);
Contributor Author

I found this to be the cleanest implementation for dealing with COPY bundles

@jrbyrnes jrbyrnes changed the title [InlineSpiller] Do not fold undef copies [RegAllocBase] Produce IMPLICIT_DEF instead of COPY undef during cleanupFailedVReg Jul 9, 2025
@arsenm
Contributor

arsenm commented Jul 10, 2025

We shouldn't describe this as "optimizing" the undef copies. I also do not understand why COPY would be special: if we were to handle copy-like target instructions, this could fail the same way.

          LIS->removeAllRegUnitsForPhysReg(MO.getReg());
        }
      }
    }
  }

  // If we have produced an undef copy, convert to IMPLICIT_DEF.
  for (MachineInstr *UndefCopy : UndefCopies) {
    assert(UndefCopy->isCopy() && UndefCopy->getNumOperands() == 2);
Contributor

This assert probably doesn't hold in general

@arsenm
Contributor

arsenm commented Jul 10, 2025

I think this is a more correct fix:

diff --git a/llvm/lib/CodeGen/TargetInstrInfo.cpp b/llvm/lib/CodeGen/TargetInstrInfo.cpp
index 8b82deb2a9d8..a36c3e03e889 100644
--- a/llvm/lib/CodeGen/TargetInstrInfo.cpp
+++ b/llvm/lib/CodeGen/TargetInstrInfo.cpp
@@ -476,6 +476,9 @@ static const TargetRegisterClass *canFoldCopy(const MachineInstr &MI,
   if (FoldOp.getSubReg() || LiveOp.getSubReg())
     return nullptr;
 
+  if (LiveOp.isUndef())
+    return nullptr;
+
   Register FoldReg = FoldOp.getReg();
   Register LiveReg = LiveOp.getReg();
 

@arsenm
Contributor

arsenm commented Jul 10, 2025

The problem isn't the undef copy, it's the fact that storeRegToStackSlot doesn't propagate the undef flag from the source when folding the copy. Disabling the optimization works, but we alternatively could pass through the undef flag and let the target produce the undef spill. Or, not emit the spill in the first place (that is complicated since all this code expects a single instruction to be produced; I guess we could instead insert an undef KILL in place of a store).
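
A rough MIR-level sketch of what those two alternatives might leave behind for the copy from the test (stack slot index illustrative):

    ; (a) propagate the undef flag and let the target emit an undef spill
    SI_SPILL_AV128_SAVE undef $vgpr0_vgpr1_vgpr2_vgpr3, %stack.2, $sgpr32, 0, implicit $exec
    ; (b) emit no spill at all; insert an undef KILL in place of the store
    KILL undef $vgpr0_vgpr1_vgpr2_vgpr3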

@arsenm
Contributor

arsenm commented Jul 10, 2025

Option 2:

diff --git a/llvm/lib/CodeGen/TargetInstrInfo.cpp b/llvm/lib/CodeGen/TargetInstrInfo.cpp
index 8b82deb2a9d8..8c23dff565ab 100644
--- a/llvm/lib/CodeGen/TargetInstrInfo.cpp
+++ b/llvm/lib/CodeGen/TargetInstrInfo.cpp
@@ -773,10 +773,14 @@ MachineInstr *TargetInstrInfo::foldMemoryOperand(MachineInstr &MI,
   const MachineOperand &MO = MI.getOperand(1 - Ops[0]);
   MachineBasicBlock::iterator Pos = MI;
 
-  if (Flags == MachineMemOperand::MOStore)
-    storeRegToStackSlot(*MBB, Pos, MO.getReg(), MO.isKill(), FI, RC, TRI,
-                        Register());
-  else
+  if (Flags == MachineMemOperand::MOStore) {
+    if (MO.isUndef()) {
+      BuildMI(*MBB, Pos, MI.getDebugLoc(), get(TargetOpcode::KILL)).add(MO);
+    } else {
+      storeRegToStackSlot(*MBB, Pos, MO.getReg(), MO.isKill(), FI, RC, TRI,
+                          Register());
+    }
+  } else
     loadRegFromStackSlot(*MBB, Pos, MO.getReg(), FI, RC, TRI, Register());
   return &*--Pos;
 }
